ABSTRACT

This meta-analysis explored how measuring student
progress toward long- vs. short-term goals affects achievement
outcomes. Twenty-one controlled studies were coded in terms of
measuring method (toward long- vs. short-term goals) and type of
achievement outcome (probe-like vs. global achievement test).
Analogues to analysis of variance conducted on weighted unbiased
effect sizes (UESs) indicated an interaction: when progress was
measured toward long-term goals, UESs on global measures were higher
than on probe-like outcomes; when progress was measured toward a series
of short-term goals, the reverse was true. Implications for special
education practice are discussed. (Author)
The Effect of Measuring Student Progress Toward Long- vs. Short-Term Goals:

A Meta-Analysis

In special education, commercial norm-referenced achievement tests
represent the traditional and predominant measurement tool for generating
individualized instructional programs and for evaluating the effects of those
programs (Ysseldyke & Thurlow, 1984). Despite the prevalence of this measurement
approach, it increasingly has been criticized (see Tindal et al., 1985; Ysseldyke
& Thurlow, 1984). With respect to generating educational programs, critics
contend that the abilities measured by these instruments frequently lack
necessary conceptualization (Ysseldyke, 1979), and relatedly that the tests often
fail to demonstrate adequate psychometric properties (Salvia & Ysseldyke, 1985).
In terms of program evaluation, critics argue that these measures fail to: (a)
indicate the extent to which specific educational objectives have been attained
(Skager, 1971), (b) provide enough alternate forms to permit ongoing progress
monitoring, (c) sample the domains of interest comprehensively (Zigmond &
Silverman, 1984), and (d) relate to curricular materials (Armbruster, Stevens, &
Rosenshine, 1977; Jenkins & Pany, 1978).

In response to these problems, ongoing criterion-referenced,
curriculum-based assessment (CBA) strategies have been developed. With CBA,
measurement procedures are designed to match students' program objectives.
Alternate test forms are drawn directly from curricula specified in objectives
and are administered at regular intervals during intervention; student progress
data are evaluated regularly with reference to the performance criteria specified
in objectives; and individualized programs are modified as required to ensure
attainment of objectives. Therefore, with CBA, instructional program evaluation
is ongoing and based in the curriculum; program development is inductive, in
response to the ongoing program evaluation data.
CBA not only is conceptually stronger than traditional assessment
strategies; data also indicate that it represents an effective alternative
approach to program development and evaluation, with an average effect size
across available controlled studies of .70 (Fuchs & Fuchs, in press). This
indicates that, in terms of the standard normal curve and an achievement test
scale with a population mean of 100 and standard deviation of 15, the use of CBA
to generate and evaluate individualized programs can be expected to raise the
typical achievement outcome score from 100 to 110.50, or from the 50th to the
76th percentile.
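
The conversion from an effect size to a scale score and percentile rank can be
sketched in a few lines of Python. This is only an illustration: it assumes
normally distributed scores, and the function name is hypothetical rather than
anything taken from the report.

```python
from statistics import NormalDist

def effect_size_to_score_and_percentile(effect_size, mean=100.0, sd=15.0):
    """Translate an effect size into an expected score and a percentile rank.

    Assumes normally distributed scores (an assumption of this sketch).
    """
    expected_score = mean + effect_size * sd
    percentile = NormalDist().cdf(effect_size) * 100.0
    return expected_score, percentile

score, percentile = effect_size_to_score_and_percentile(0.70)
print(f"{score:.2f}", round(percentile))  # 110.50 76
```

An effect size of .70 therefore moves the typical student about 10.5 scale
points, which corresponds to roughly the 76th percentile of the comparison
distribution.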

Additionally, the requirements of federal legislation seem to indicate
the importance of CBA: The IEP mandate of PL 94-142 requires special educators
to specify long-term goals, short-term objectives, and assessment procedures for
monitoring students' attainment of goals and objectives. Assuming that the
intent of this legislation was to base goals and objectives in pupils' curricula,
then the IEP mandate requires a CBA approach to progress evaluation.

Despite the apparent effectiveness of and seeming necessity for CBA, it
remains unclear how practitioners should design CBA procedures to monitor
students' attainment of goals and objectives. One reason for this lack of
clarity stems from the IEP mandate itself, which fails to specify whether
student progress should be monitored toward the relatively broad goal statements
or the more numerous and narrow objectives that typically are generated for each
IEP goal. Currently, practitioners can select between two types of CBA, one
focusing on the attainment of long-term goals (CBA-goal) and the other on
short-term objectives (CBA-objective).

With the CBA-goal approach, an annual curriculum-based goal is specified
and a large pool of related measurement items is created. From this measurement
pool, subsets of items, or monitoring probes, are drawn randomly (see Fuchs,
Deno, & Mirkin, 1984). The difficulty level of the monitoring probe remains
constant over a long time. Contrastingly, with the CBA-objective approach, a
series of objectives corresponding to steps within a hierarchical curriculum is
specified, and a series of relatively circumscribed, small pools of items is
created, each of which corresponds to a specific objective (see Lindsley, 1971;
White & Haring, 1980). The difficulty level of material on which students are
measured increases as students master the sequentially related objectives.

Both types of CBA are ongoing, criterion-referenced, curriculum-based,
and enjoy strong curricular validity or correspondence between tests and
programmatic goals and objectives (McClung cited in Popham & Yalow, 1984).
However, these systems do differ conceptually. CBA-objective appears to have
stronger instructional validity or correspondence between tests and instruction
(Yalow & Popham, 1984). The monitoring probes for short-term measurement are
related directly to current instructional material, so, for example, if an
instructional intervention is introduction of the r-controlled phonics rule, the
monitoring measure is reading r-controlled words. Alternately, with CBA-goal,
the monitoring probes are not related to the instructional material. The
instructional intervention may be introduction of the r-controlled phonics rule,
whereas the monitoring measure may involve oral reading fluency, accuracy, and/or
comprehension on second grade passages.

Although CBA-objective may enjoy stronger instructional validity,
CBA-goal is advantageous in other respects. It possesses better content validity
or representation of the ultimate desired performance, i.e., reading
fluency/comprehension (Yalow & Popham, 1984). Additionally, its concurrent
validity or correlation with other measures of achievement appears to be stronger
than that of CBA-objective (Fuchs, 1982).

The emergent question, and the focus of the current meta-analysis, is how
well these types of ongoing criterion-referenced, curriculum-based assessment
strategies relate to outcome measures of student achievement. The investigation
of this question should help practitioners assess the relative merits of the two
types of CBA, and select CBA monitoring procedures that maximize student growth.

Method 

Search Procedure 

The search for pertinent studies to include in the meta-analysis
comprised four steps. First, employing the Thesaurus of Psychological Index
Terms (APA, 1982), multiple descriptors were generated for key terms. For
example, student achievement alternately was represented by "student progress,"
"goal attainment," and "educational effects." Second, these terms facilitated a
computer search of three on-line data bases: (a) ERIC, a data base of educational
materials from the Educational Resources Information Center consisting of
abstracts from Research in Education and Current Index to Journals in Education;
(b) Comprehensive Dissertation Abstracts; and (c) Psychological Abstracts.
Third, employing similar key descriptors, a manual search was conducted of five
educational journals for the years 1973 through 1983. These journals were:
American Educational Research Journal, Journal of Learning Disabilities,
Journal of Precision Teaching, Journal of Special Education, and Learning
Disability Quarterly. Fourth, the reference sections of relevant papers along
with identified bibliographies were explored for additional studies.

Criteria for Relevant Studies 

A study was considered for inclusion if it employed a control group to
evaluate the effects of curriculum-based monitoring on academic achievement. Such
monitoring was defined as curriculum-based data collection that occurred at least
twice weekly, with decisions concerning the adequacy of programs formulated on an
individual, not group, basis. Studies were excluded that (a) monitored social
behaviors, (b) primarily focused on the use of behavior modification, while
employing time series to test experimental effects, (c) provided test feedback
only to students, and/or (d) employed college age students as subjects.

The search yielded 29 studies that met the criteria established for
inclusion. From these studies, 8 were eliminated because of insufficient data
for calculating meta-analytic statistics.



Data Extracted from Each Study

Data aggregation. Guidelines were established to ensure that each
relevant effect was counted only once in analyses. When an effect was measured
by different instruments or by subtests that failed to represent dimensions
relevant to the meta-analysis (i.e., Reading Comprehension and Structural
Analysis Subtests of the Stanford Diagnostic Reading Test), results from the
instruments or subtests were pooled. For example, if achievement within a study
were measured with three global tests and two probe-like measures, the three
effect sizes for the global tests would be averaged, as would the two for the
probe-like tests. So, two, rather than five, effect sizes would be included for
such a study.
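
The pooling rule in this example can be sketched as follows. The study label,
the effect size values, and the function name are hypothetical and serve only
to illustrate the averaging step.

```python
from collections import defaultdict
from statistics import mean

def pool_effect_sizes(records):
    """Average effect sizes that share the same study and outcome type.

    records: iterable of (study_id, outcome_type, effect_size) tuples.
    Returns a dict keyed by (study_id, outcome_type).
    """
    grouped = defaultdict(list)
    for study_id, outcome_type, effect_size in records:
        grouped[(study_id, outcome_type)].append(effect_size)
    return {key: mean(values) for key, values in grouped.items()}

# Hypothetical study with three global tests and two probe-like measures.
records = [
    ("study_A", "global", 0.40),
    ("study_A", "global", 0.60),
    ("study_A", "global", 0.50),
    ("study_A", "probe-like", 0.90),
    ("study_A", "probe-like", 0.70),
]
pooled = pool_effect_sizes(records)
print(len(pooled))  # two pooled effect sizes instead of five
```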

Definition and calculation of effect size. Results of the studies were
transformed to a common metric, effect size, defined here as the difference
between the treatment means, divided by the control group standard deviation.
For purposes of analysis, an effect was given a positive sign if subjects
achieved greater scores in the systematic monitoring treatment. For studies
reporting relevant means and standard deviations for both groups, effect sizes
were calculated from these measurements. For studies not reporting means and
standard deviations, effect sizes were calculated from other statistics, such as
F or t values (see Glass, McGaw, & Smith, 1981).
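
A minimal sketch of these calculations follows. The first function implements
the definition given above. The t and F conversions use the textbook
two-independent-groups formulas, which is an assumption about which conversions
were applied; they standardize by a pooled rather than a control group standard
deviation. All function names are hypothetical.

```python
import math

def glass_delta(mean_treatment, mean_control, sd_control):
    """Effect size as defined above: mean difference over control group SD."""
    return (mean_treatment - mean_control) / sd_control

def effect_size_from_t(t_value, n_treatment, n_control):
    """Standardized mean difference recovered from an independent-groups t."""
    return t_value * math.sqrt(1.0 / n_treatment + 1.0 / n_control)

def effect_size_from_f(f_value, n_treatment, n_control):
    """Two-group F (1 numerator df) equals t squared.

    The result is unsigned; the sign must be set from the direction of the
    reported group difference.
    """
    return effect_size_from_t(math.sqrt(f_value), n_treatment, n_control)

print(glass_delta(55.0, 50.0, 10.0))                # 0.5
print(round(effect_size_from_t(2.0, 20, 20), 3))    # 0.632
```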

Each effect size was converted to an unbiased effect size (UES) to
correct for inconsistency in estimating true from observed effect sizes (Hedges,
1981). The difference between the observed and unbiased effect sizes was
negligible (M = .019, SD = .025) as has been demonstrated elsewhere
(Bangert-Drowns, Kulik, & Kulik, 1983). Nevertheless, UESs were employed to
ensure the mathematical tractability of the data.
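
The small-sample correction can be illustrated with the widely used
approximation to the Hedges (1981) correction factor, 1 - 3/(4m - 1), where m is
the degrees of freedom of the standard deviation used in the denominator.
Whether the approximate or the exact factor was applied here is not stated, so
the sketch below is an assumption, and the sample values are hypothetical.

```python
def unbiased_effect_size(effect_size, degrees_of_freedom):
    """Apply the approximate small-sample correction to an effect size.

    degrees_of_freedom: df of the SD in the denominator, e.g. n_control - 1
    when the control group SD is used.
    """
    correction = 1.0 - 3.0 / (4.0 * degrees_of_freedom - 1.0)
    return effect_size * correction

# Hypothetical case: observed effect size of .70 with 20 control subjects.
print(round(unbiased_effect_size(0.70, 19), 3))  # 0.672
```

With samples of this size the adjustment is a few hundredths, in line with the
negligible average difference reported above.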

There were 96 effect sizes, with between 1 and 12 effect sizes per study.
Analyses indicated no statistical dependency between effect size magnitude and
number of comparisons per study (r = .12). Therefore, UESs were aggregated at
the individual effect size level. In combining these UESs, weighted averages
were calculated to account for the variances of the UESs (see Hedges, 1984).
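
A sketch of inverse-variance weighting is given below. It assumes the
large-sample variance formula for a standardized mean difference and weights
equal to the reciprocal of that variance; the report cites Hedges (1984) but
does not spell out the formulas, so these details and the function names are
assumptions.

```python
import math

def effect_size_variance(d, n_treatment, n_control):
    """Large-sample variance of a standardized mean difference."""
    n_total = n_treatment + n_control
    return n_total / (n_treatment * n_control) + d ** 2 / (2.0 * n_total)

def weighted_mean_effect(effect_sizes, variances):
    """Inverse-variance weighted mean, its standard error, and a z value."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean_d = sum(w * d for w, d in zip(weights, effect_sizes)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return mean_d, standard_error, mean_d / standard_error

# Hypothetical UESs from two comparisons of 20 subjects per group.
effects = [0.40, 0.80]
variances = [effect_size_variance(d, 20, 20) for d in effects]
mean_d, se, z = weighted_mean_effect(effects, variances)
print(round(mean_d, 3), round(se, 3), round(z, 2))
```

More precise effect sizes (those with smaller variances) receive more weight, so
the weighted mean is pulled toward the better estimated comparisons.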

Study features. To describe study features pertinent to the current
investigation, two major substantive variables were identified and coded for each
study. The first study feature was type of goal. This variable had two
levels that differentiated studies in which progress toward long-term goals
(CBA-goal) was monitored from studies in which progress toward a short-term
objective or a series of short-term objectives (CBA-objective) was monitored.

Studies in which progress toward long-term goals was monitored involved
the specification of a level of material on which a student was expected to be
proficient within the next 15 or more weeks. For example, for a student
currently reading proficiently on primer material, the student's goal might specify
that, in 25 weeks, the student would read 75 words per minute correct with 90%
accuracy on second grade reading passages. Then, for the next 25 weeks,
measurement probes would be randomly sampled from the second grade reading
passages, representing approximately equivalent samples of measurement material.

Studies in which progress toward short-term goals was monitored required
the identification of a sequence of small segments in a hierarchical curriculum
to be mastered by the student. For example, the series of objectives might
specify that the student would read, with 90% accuracy, flashcards first with
consonant-vowel-consonant words, second with final e words, and third with double
vowel words. Proceeding in a fashion parallel to the specification of
objectives, measurement probes first would be drawn from flashcards with
consonant-vowel-consonant words until the mastery criterion was achieved by the
student on that domain. Then, the measurement domain would change so that probes
were flashcards with final e words, and so on.

The second study feature was outcome measure. This variable also had two
levels: dependent measures similar to the monitoring probes and more global
achievement tests. Employing the examples provided above, probe-like outcome
indices were oral reading rate on second grade passages or percentage read
correctly from flashcards with final e words; global achievement tests were the
Structural Analysis and Reading Comprehension Subtests of the Stanford Diagnostic
Reading Test.

In addition to these two substantive features, a third, methodological
variable was coded for each study: duration of the treatment. This variable had
three levels: treatments implemented for less than 3 weeks (coded "1");
treatments lasting between 3 and 10 weeks (coded "2"); and treatments continued
for more than 10 weeks (coded "3"). A previous investigation (Fuchs & Fuchs, in
press) explored methodological quality of the studies and identified no relation
between effect size magnitude and study quality.
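
The duration coding scheme can be expressed as a small function. Treating the
3- and 10-week boundaries as inclusive for code 2 is an assumption, since the
text says only "between 3 and 10 weeks"; the function name is hypothetical.

```python
def code_treatment_duration(weeks):
    """Map treatment length in weeks to the three-level duration code."""
    if weeks < 3:
        return 1   # less than 3 weeks
    if weeks <= 10:
        return 2   # between 3 and 10 weeks (boundaries assumed inclusive)
    return 3       # more than 10 weeks

print([code_treatment_duration(w) for w in (2, 3, 10, 25)])  # [1, 2, 2, 3]
```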

Two raters independently coded 10 of the 21 studies (48%). Percentage of
agreement¹ for the raters on type of goal, outcome measure, and duration of
treatment, respectively, was .90, .80, and 1.00.

Characteristics of the Sample 

Of the 23 references listed in the Appendix, which represent 21 separate
investigations,² there are 4 dissertations, 11 unpublished studies, and 8 journal
articles. Among the published papers, 3 appeared in Exceptional Children, 2 in
American Educational Research Journal, and 1 each in Teaching Exceptional
Children, American Journal of Mental Deficiency, and Journal of Precision
Teaching. A total of 3835 subjects participated in these studies, with 83% of
the investigations employing handicapped subjects. Of these handicapped pupils,
98% were mildly to moderately handicapped and 2% were severely handicapped. The
grade level of these subjects ranged from preschool through high school, with a
median grade level of 3.8. Among the 21 investigations, 8 (38%) focused solely
on the academic area of reading, 4 (19%) on reading and math, 3 (14%) only on
math, and 1 (5%) each on (a) high school content areas, (b) preschool skills, (c)
spelling, (d) math and spelling, (e) reading, math, and spelling, and (f)
writing, math, and spelling.



Results

Of the 96 effect sizes, 27 related to long-term goal measurement and 69
to short-term goal measurement. Of the 27 long-term goal effect sizes, 14 were
associated with probe-like and 13 with global outcome measures. Of the 69
short-term goal effect sizes, 37 were related to probe-like and 32 to global
outcome measures.

Relation between treatment duration and other effect size features. A
pair of t tests was run to determine whether measurement goal or outcome measure
was related to the duration of treatment. These tests indicated no statistically
significant associations. For the long-term goal effect sizes, the mean coded
level of treatment duration (see above) was 2.92 (SD = .27); for the short-term
goal effect sizes, 2.75 (SD = .46), t(95) = 1.81, ns. The average levels of
treatment duration for effect sizes associated with probe-like and global outcome
measures, respectively, were 2.78 (SD = .51) and 2.76 (SD = .23), t(95) = .24,
ns. The absence of a relation between treatment duration and type of measurement
goal or dependent measure permits a relatively straightforward interpretation of
the analyses presented below.
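
These comparisons can be checked approximately from the reported summary
statistics. The sketch assumes a pooled-variance independent-groups t test and
takes the group sizes from the effect size counts given at the start of the
Results (27 long-term and 69 short-term; 51 probe-like and 45 global); the
report does not state which variant of the t test was used, and the function
name is hypothetical.

```python
import math

def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-groups t statistic using a pooled variance estimate."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_1 + 1.0 / n_2))
    return (mean_1 - mean_2) / standard_error

# Coded duration: long-term vs. short-term goal effect sizes.
t_goal = pooled_t(2.92, 0.27, 27, 2.75, 0.46, 69)
# Coded duration: probe-like vs. global outcome effect sizes.
t_outcome = pooled_t(2.78, 0.51, 51, 2.76, 0.23, 45)
print(round(t_goal, 2), round(t_outcome, 2))
```

Under these assumptions the statistics come out at about 1.80 and 0.24, close
to the reported values of 1.81 and .24; small discrepancies are expected because
the inputs are rounded.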

Relation between effect size magnitude and effect size features. Table 1
displays the weighted UESs by (a) the type of goal factor (long-term goal vs.
short-term objective) and (b) the outcome measure factor (probe-like vs. global
achievement test). To examine the relation between these variables and effect
size magnitude, Hedges's (1984) analogue to analysis of variance was employed.
When conventional analysis of variance is conducted on effect sizes, problems
exist because of the possibility that systematic variance will be pooled into the
estimate of error variance. Moreover, violation of the homoscedasticity
assumption is severe in research synthesis, and there is little reason to believe
that the usual robustness of the F test will prevail (see Hedges, 1984). Thus,
Hedges's analogue to analysis of variance was employed to avoid these conceptual
and statistical problems. As indicated in Table 1, neither factor produced a
statistically significant difference in the UESs.



Insert Table 1 about here 



Nevertheless, additional analyses suggested the presence of an
interaction between type of goal and outcome measure. Specifically, the effect
of the type of outcome measure was analyzed within each of the type of goal
conditions. As shown in Table 2 and Figure 1, within the type of goal
conditions, there were statistically significant differences between UESs
associated with the probe-like and the global outcome measures. With
CBA-objective, UESs associated with probe-like outcome measures were higher than
those of global measures. For CBA-goal, the reverse was true: UESs associated
with global measures were higher than those related to probe-like outcome
measures.
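
The between-group chi-square statistic in an analogue to analysis of variance
can be sketched from the values reported in Table 2. The sketch assumes that
each z value equals the weighted mean divided by its standard error, so that the
total weight for a group can be recovered as (z / mean) squared; that
back-calculation and the function names are assumptions, not the procedure used
in the original analysis.

```python
def weight_from_mean_and_z(mean_ues, z_value):
    """Recover a group's total inverse-variance weight from its mean and z."""
    standard_error = mean_ues / z_value
    return 1.0 / standard_error ** 2

def q_between(means, weights):
    """Weighted sum of squared deviations of group means from the grand mean."""
    grand_mean = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))

CHI2_CRITICAL_DF1_P001 = 10.828  # chi-square critical value, df = 1, p = .001

# Short-term goals: probe-like (.85, z = 22.97) vs. global (.45, z = 11.54).
q_short = q_between(
    [0.85, 0.45],
    [weight_from_mean_and_z(0.85, 22.97), weight_from_mean_and_z(0.45, 11.54)],
)
# Long-term goals: probe-like (.41, z = 7.32) vs. global (.92, z = 16.73).
q_long = q_between(
    [0.41, 0.92],
    [weight_from_mean_and_z(0.41, 7.32), weight_from_mean_and_z(0.92, 16.73)],
)
print(round(q_short, 1), round(q_long, 1))
print(q_short > CHI2_CRITICAL_DF1_P001, q_long > CHI2_CRITICAL_DF1_P001)
```

Under these assumptions the statistics come out near 55 and 42, in the same
range as the reported values of 56.78 and 41.59 (the inputs are rounded to two
decimals), and both exceed the .001 critical value for one degree of freedom.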

Insert Table 2 and Figure 1 about here



Discussion 

The purpose of this meta-analysis was to investigate how well measuring
progress toward long- vs. short-term goals relates to contrasting outcome measures
of student achievement. Toward this end, a literature search was conducted,
resulting in the identification of 21 relevant studies that provided sufficient
information for the calculation of meta-analytic statistics. These studies were
coded for long-term vs. short-term goal measurement and for probe-like vs. global
outcome achievement measures. To investigate a possible confound inherent in
such a study, that short-term and long-term goal measurement or probe-like and
global achievement measures might be related to the duration of the experimental
treatment, study durations also were coded. Analyses indicated no reliable
association between either substantive variable and treatment duration.

Analogues to analysis of variance indicated that the magnitude of effect
size was not related either to the type of outcome measure employed or to the
type of goal on which monitoring occurred. However, additional analyses
suggested an interaction: When progress was measured toward long-term goals,
effect sizes on global outcome measures were higher than on probe-like outcomes.
On the other hand, when progress was measured toward a series of short-term goals,
effect sizes were greater on probe-like than on global outcome measures.

This finding may be explained in terms of the types of validity
associated with the different goal measurement strategies. With long-term goal
measurement, instructional validity is relatively poor whereas content and
concurrent validity may be comparatively strong. For example, a student might be
measured, over a year-long period, on oral reading rate and accuracy in material
one year above instructional level. Such measures clearly are unrelated to
instructional activities, but have been shown to correlate well with global
measures of reading skills, including tests of decoding, word recognition, and
comprehension (Deno, Mirkin, & Chiang, 1982; Fuchs, 1981). Therefore, it is not
surprising that, in this study, measuring student progress toward long-term goals
was associated more strongly with global achievement outcome measures than with
more narrow measurement probes.

On the other hand, with short-term goal measurement, instructional
validity is relatively high while content and concurrent validity may be
comparatively limited. For example, a student might be measured, over a
year-long period, on a series of short-term objectives, each of which is related
clearly to the current instructional material. However, Quilling and Otto (1971)
found that mastery of a hierarchy of reading decoding skills related
inconsistently to global indices of reading achievement. Therefore, it is not
surprising that, in this investigation, measuring student progress toward
short-term goals was associated more strongly with performance measures similar
to the probes on which monitoring occurred than with global achievement measures.

Teachers may prefer short-term goal measurement because it is easier to
understand and it guides instruction more directly by providing information about
when to progress from one skill to another (Fuchs, Wesson, Tindal, Mirkin, &
Deno, 1982). Nevertheless, as demonstrated in this meta-analysis, short-term
goal measurement may be misleading: While students master a series of
instructional objectives, progress on more global indices of achievement may be
limited, failing to reflect this gain. Additionally, practitioners frequently
specify long lists of short-term objectives on IEPs (Gillespie-Silver,
Schachter, & Warren, 1980), a phenomenon that can make short-term objective
measurement more cumbersome and time-consuming than long-term goal measurement:
Short-term objective monitoring may require teachers to adapt measurement probes
and procedures more often.

The finding that long-term goal monitoring relates better to global
achievement outcome measures may be especially important in the education of
handicapped students, who typically have poorly developed strategies for
maintaining and transferring skills (Anderson-Inman, Walker, & Purcell, 1984;
White, 1984). Short-term goal measurement focuses on instructionally related,
relatively restricted domains of material for a period of time and then, upon
mastery of that material, the measurement and instructional focus simultaneously
changes. Such a paradigm may be problematic for at least two reasons. First, a
close connection between instruction and measurement may encourage teachers to
present new skills to students within the framework of the measurement task. For
example, if the measurement procedure requires the pupil to read
consonant-vowel-consonant words from a list, the teacher may focus instruction on
reading consonant-vowel-consonant words from a list. As noted by Goodstein
(1982), there may be danger in tying the instructional format too closely to the
assessment device or of narrowly defining content-x-format domains of
criterion-referenced assessment. Such a restricted instructional format may
limit the transfer of skills.

Second, simultaneously changing instructional focus and measurement
domain may fail to encourage teachers to review material sufficiently to allow
for long-term skill maintenance and generalization. A more global, long-term
goal approach to measurement, which still is rooted in the curriculum and is
criterion-referenced, may encourage teachers to incorporate instructional
procedures that better allow for maintenance and generalization of skills.

Findings may be relevant not only to the development of systematic,
continuous progress evaluation procedures but also to teachers' less formal
monitoring strategies, including the periodic use of commercial
criterion-referenced measures such as basal series mastery tests and the Brigance
(1978). Data generated from periodic administrations of such instruments, where
test domains are tied closely and narrowly limited to the instructional focus,
may fail to relate to global academic progress. Therefore, teachers might exert
caution as they interpret such data bases.

Finally, a summative comment seems warranted. As practitioners develop
their programmatic or IEP goal and objective statements and their related
curriculum-based assessment procedures for monitoring pupil progress toward those
goals and objectives, it seems important for them to keep in mind the distinction
between curricular and content validity. Curricular validity refers to the match
between testing and IEP goals and objectives; content validity, the
correspondence between testing and the true domain in which proficiency is
desired (Yalow & Popham, 1983). It is only when practitioners write "significant
rather than trivial" (Popham et al., 1985) IEP goals and objectives, which relate
well to the true desired outcome performance, that curricular and content
validity of curriculum-based assessment are both strong. It is only under these
conditions that "measurement-driven instruction" (Popham et al., 1985), or
ongoing assessment of pupils' progress to guide instructional planning, has an
important, global effect on pupil achievement. This, together with findings of
the current meta-analysis, suggests that curriculum-based assessment of long-term
goals, which accurately reflect the desired outcome performance, may represent a
better strategy for monitoring pupil progress than the assessment of narrowly
circumscribed short-term objectives.
Footnotes

¹ Percentage of agreement was calculated using the following formula
(Coulter cited in Thompson, White, & Morgan, 1982): Percentage of agreement =
agreements between observer A & observer B / (agreements between A & B +
disagreements between A & B + omissions by A + omissions by B).

² One paper authored by Haring (1971) and two additional reports by Haring
& Krug (1975a, 1975b) described aspects of the same investigation. Therefore,
although it is reported that 21 studies were employed in the meta-analysis, 23
appear in the Appendix due to the separate listing of the Haring and the Haring
and Krug papers.
Table 1

Weighted Mean UESs, z Values, and Chi-Square Statistics as Analogues to
Analysis of Variance by Type of Goal and Outcome Measure Factors

Factor               Weighted Mean    z Valueᵃ    Nᵇ    Chi-Square    df

Type of Goal                                      96        .69        1
  Long-term               .63          16.58      27
  Short-term              .67          24.82      69

Outcome Measure                                   96       6.63        1
  Probe-like              .72          23.23      45
  Global                  .61          19.06      51

ᵃ A significant z value indicates that the weighted mean is reliably different
from zero. All z values are significant beyond the .001 level.

ᵇ N represents number of UESs not number of studies.
Table 2

Weighted Mean UESs, z Values, and Chi-Square Statistics as Analogues to
Analysis of Variance for Probe-Like and Global Outcome Measures within
Type of Goal Conditions

Type of Goal/
Outcome Measure      Weighted Mean    z Valueᵃ    Nᵇ    Chi-Square    df

Short-Term Goal
  Outcome Measure                                 69      56.78ᶜ       1
    Probe-Like            .85          22.97      37
    Global                .45          11.54      32

Long-Term Goal
  Outcome Measure                                 27      41.59ᶜ       1
    Probe-Like            .41           7.32      14
    Global                .92          16.73      13

ᵃ A significant z value indicates that the weighted mean is reliably different
from zero. All z values are significant beyond the .001 probability level.

ᵇ N represents number of UESs not number of studies.

ᶜ p < .001.
Figure Caption

Figure 1. Unbiased mean effect sizes (UESs) for CBA-objective
and CBA-goal on probe-like and global outcome measures.